Joint Parsing and Alignment with Weakly Synchronized Grammars
Authors
Abstract
Syntactic machine translation systems extract rules from bilingual, word-aligned, syntactically parsed text, but current systems for parsing and word alignment are at best cascaded and at worst totally independent of one another. This work presents a unified joint model for simultaneous parsing and word alignment. To flexibly model syntactic divergence, we develop a discriminative log-linear model over two parse trees and an ITG derivation which is encouraged but not forced to synchronize with the parses. Our model gives absolute improvements of 3.3 F1 for English parsing, 2.1 F1 for Chinese parsing, and 5.5 F1 for word alignment over each task’s independent baseline, giving the best reported results for both Chinese-English word alignment and joint parsing on the parallel portion of the Chinese treebank. We also show an improvement of 1.2 BLEU in downstream MT evaluation over basic HMM alignments.
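To make "encouraged but not forced to synchronize" concrete, here is a minimal Python sketch of a log-linear score over an (English tree, Chinese tree, ITG derivation) triple. It is illustrative only, and everything in it is an assumption:

- Trees are represented as sets of constituent spans.
- The ITG derivation is a list of bispans.
- The four feature names (`both`, `e_only`, `f_only`, `neither`) and their weights are hypothetical, not the paper's actual feature set.
- Normalizing by brute-force enumeration over an explicit candidate list is feasible only for toy inputs.

```python
import math
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]            # (start, end) word span, end exclusive
Bispan = Tuple[Span, Span]        # (English span, Chinese span) of one ITG node
Candidate = Tuple[Set[Span], Set[Span], List[Bispan]]  # (English tree, Chinese tree, ITG nodes)


def sync_features(e_tree: Set[Span], f_tree: Set[Span],
                  itg_nodes: List[Bispan]) -> Dict[str, int]:
    """Hypothetical features: count ITG nodes by whether each side matches a constituent."""
    feats = {"both": 0, "e_only": 0, "f_only": 0, "neither": 0}
    for e_span, f_span in itg_nodes:
        in_e, in_f = e_span in e_tree, f_span in f_tree
        if in_e and in_f:
            feats["both"] += 1
        elif in_e:
            feats["e_only"] += 1
        elif in_f:
            feats["f_only"] += 1
        else:
            feats["neither"] += 1
    return feats


def score(weights: Dict[str, float], feats: Dict[str, int]) -> float:
    """Linear score w . f; exp(score) is the unnormalized log-linear weight."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def posterior(weights: Dict[str, float], candidates: List[Candidate]) -> List[float]:
    """Normalize exp(score) over an explicit candidate list (toy-sized only)."""
    scores = [score(weights, sync_features(*c)) for c in candidates]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical weights: agreement is rewarded; mismatch is penalized but finite.
WEIGHTS = {"both": 1.0, "e_only": -0.5, "f_only": -0.5, "neither": -1.0}

# Toy two-word sentence pair whose ITG derivation inverts the two words.
itg = [((0, 2), (0, 2)), ((0, 1), (1, 2)), ((1, 2), (0, 1))]
synced = ({(0, 2), (0, 1), (1, 2)}, {(0, 2), (0, 1), (1, 2)}, itg)
divergent = ({(0, 2)}, {(0, 2)}, itg)

probs = posterior(WEIGHTS, [synced, divergent])
print(probs)
```

The mismatch weights are finite rather than negative infinity. As a result, the divergent candidate keeps a small but nonzero probability. That is the sense in which synchronization here is a soft preference rather than a hard constraint.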
Similar resources
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing as a pipeline. Unfortunately, in pipel...
Alignment Elimination from Adams' Grammars
Adams’ extension of parsing expression grammars enables specifying indentation sensitivity using two non-standard grammar constructs — indentation by a binary relation and alignment. This paper proposes a step-by-step transformation of well-formed Adams’ grammars for elimination of the alignment construct from the grammar. The idea that alignment could be avoided was suggested by Adams but no p...
Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
Word alignment has an exponentially large search space, which often makes exact inference infeasible. Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. However, spurious ambiguity may occur in synchronous parsing and cause problems in bot...
S4 enriched multimodal categorial grammars are context-free
Bar-Hillel et al. [1] prove that applicative categorial grammars weakly recognize the context-free languages. Buszkowski [2] proves that grammars based on the product-free fragment of the non-associative Lambek calculus NL recognize exactly the context-free languages. Kandulski [7] furthers this result by proving that grammars based on NL also recognize exactly the context-free languages. Jäger ...
Deterministic Shift-Reduce Parsing for Unification-Based Grammars by Using Default Unification
Many parsing techniques, including parameter estimation, assume the use of a packed parse forest for efficient and accurate parsing. However, they have several inherent problems deriving from the restriction of locality in the packed parse forest. Deterministic parsing is one of the solutions that can achieve simple and fast parsing without the mechanisms of the packed parse forest by accurately choo...
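Both the main abstract and the entry on spurious ambiguity in ITG-based word alignment treat inversion transduction grammars (ITGs) as a constraint on alignment. The sketch below shows what that constraint rules out. It is a minimal illustration, not the synchronous parsing algorithm of any paper listed here.

It assumes a one-to-one alignment written as a permutation, where `perm[i]` is the target position of source word `i`. Unaligned and many-to-many links are ignored. The hypothetical helper `is_itg_permutation` reduces adjacent blocks with a stack and accepts exactly the permutations a binary ITG can derive.

```python
from typing import List, Tuple


def is_itg_permutation(perm: List[int]) -> bool:
    """Return True if this word-order permutation is derivable by a binary ITG.

    Shift-reduce check: each source word becomes a block covering one target
    position. The top two blocks on the stack are merged whenever together they
    cover a contiguous target range, in either straight or inverted order. The
    permutation is ITG-compatible iff everything reduces to a single block.
    """
    stack: List[Tuple[int, int]] = []  # blocks as (lo, hi) target-position ranges
    for pos in perm:
        stack.append((pos, pos))
        # Greedily merge while the top two blocks are adjacent on the target side.
        while len(stack) >= 2:
            lo2, hi2 = stack[-2]
            lo1, hi1 = stack[-1]
            if hi2 + 1 == lo1 or hi1 + 1 == lo2:  # straight or inverted combination
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) == 1


print(is_itg_permutation([1, 0, 3, 2]))
print(is_itg_permutation([1, 3, 0, 2]))
```

For example, `[1, 0, 3, 2]` is accepted: two inverted pairs are joined in straight order. By contrast, `[1, 3, 0, 2]` is rejected because it is the "inside-out" reordering that a binary ITG cannot express.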